Disease outbreak dataset exploration

by Ahmed Hanafi

Preliminary Wrangling

What is the structure of your dataset?

What is/are the main feature(s) of interest in your dataset?

The dataset records disease incidence together with the resulting human and animal infections and deaths.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

The data include location, which could help map out the features of interest.

Univariate Exploration

humans_affected distribution with log scale

let's dig deeper

When a record reports humans affected, the most common value is 1, which is a good sign. Most of this column is null, though; I think null means zero, so I filled it with zero. There are also outliers with around 500 humans affected, which raises a concern: we need to see which diseases cause them.
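A minimal sketch of the null filling and log-scaled binning described above, assuming the data sit in a pandas DataFrame. The toy values and the `log_bin_counts` helper are hypothetical and only illustrate the idea; `humans_affected` is the column named in this section.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the outbreak table (values are made up for illustration).
df = pd.DataFrame({"humans_affected": [np.nan, 1, 1, np.nan, 3, 1, 500, np.nan]})

# Assumption taken from the text: a null means no humans were affected.
df["humans_affected"] = df["humans_affected"].fillna(0)


def log_bin_counts(values, n_bins=6):
    """Count the positive values in logarithmically spaced bins (hypothetical helper)."""
    positive = values[values > 0]
    # geomspace gives bin edges that are evenly spaced on a log axis.
    edges = np.geomspace(1, positive.max(), n_bins + 1)
    counts, edges = np.histogram(positive, bins=edges)
    return counts, edges


counts, edges = log_bin_counts(df["humans_affected"])

# Share of records that are zero after filling; zeros cannot appear on a log axis.
zero_share = (df["humans_affected"] == 0).mean()
```

The same `edges` could be passed as `bins` to a histogram plot whose x axis is set to a log scale. The zeros are reported separately because they have no position on a log axis.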

human_age distribution with log scale

Most human cases have a recorded age below 1 year, or the value is zero because most of the entries were null.

human_deaths distribution with log scale

When deaths are reported, the mode of humans_deaths is 1. Sometimes the number of deaths is high, which raises a concern: we need to see which diseases cause that. Most of the column is null, which I think means zero, so I filled it with zero.

sum_cases of animals distribution with log scale

let's dig deeper

Most sum_cases values for animals are below 10 and the mode is 1, which is a good sign, but there are records where the value is very large, up to 800K. I think those cases are in poultry; we need to see which diseases cause that.
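One way to check which diseases sit behind the large values is to filter the records above a cut-off and group them by disease. This is only a sketch: the `disease` column name, the toy numbers, and the `threshold` value are assumptions, not taken from the real data.

```python
import pandas as pd

# Toy records; the disease names appear in this notebook, the numbers are made up.
df = pd.DataFrame({
    "disease": ["Avian_influenza", "Avian_influenza", "African swine fever",
                "Anthrax", "Newcastle disease"],
    "sum_cases": [800_000, 4, 1, 2, 120_000],
})

# Hypothetical cut-off for "too large"; tune it to the real distribution.
threshold = 100_000

# Keep only the outlier records, then summarise them per disease.
outliers = df[df["sum_cases"] >= threshold]
by_disease = (
    outliers.groupby("disease")["sum_cases"]
    .agg(["count", "max"])
    .sort_values("max", ascending=False)
)
```

The same pattern would apply to sum_destroyed and sum_slaughtered by swapping the column name.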

sum_destroyed animals distribution with log scale

let's dig deeper

sum_destroyed sometimes reaches really high numbers, up to 3.66 million, which raises a concern. I think those cases are in poultry; we need to see which diseases cause that. I also think the nulls mean zero, so I filled them with zero.

sum_slaughtered animals distribution with log scale

let's dig deeper

Most of the column is null, and I think null means zero, which would also be the mode. There are high outliers reaching nearly a million, which need more investigation.

diseases distribution

The highest count is for Avian_influenza, followed by African swine fever, and viral diseases have the highest incidence.

country distribution

The countries with the most records are, in order, Poland, China (which is expected), Romania, France, Indonesia, Greece, and Egypt.

region distribution

Europe has the most records, followed by Asia.

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

There were high outliers that need more investigation. The only changes I made were renaming the columns and filling nulls with zero where noted above.

Bivariate Exploration

disease vs sum humans_affected

disease vs sum humans_deaths

sum humans_affected through months

case fatality per disease for human

diseases vs sum of animal cases

let's dig deeper, excluding poultry diseases and African swine fever

diseases vs sum of animal deaths

let's dig deeper, excluding poultry diseases and African swine fever

diseases vs sum of animal slaughtered

let's dig deeper, excluding poultry diseases and African swine fever

diseases vs sum of animal destroyed

let's dig deeper, excluding poultry diseases and African swine fever

Poultry diseases dominate the numbers, followed by African swine fever and other diseases of pigs and ruminants.

sum_cases per month

There are two spikes, one in 2006 and one in 2012.
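A monthly total like the one discussed here can be sketched by grouping on the month of each record. The `observation_date` column name is an assumption (the real date column may be named differently), and the values are toy data chosen only to mimic spikes in 2006 and 2012.

```python
import pandas as pd

# Toy records; "observation_date" is a hypothetical column name.
df = pd.DataFrame({
    "observation_date": pd.to_datetime(
        ["2006-02-03", "2006-02-20", "2006-03-01", "2012-07-15", "2012-07-30"]
    ),
    "sum_cases": [500, 300, 10, 900, 50],
})

# Sum the cases within each calendar month.
monthly = df.groupby(df["observation_date"].dt.to_period("M"))["sum_cases"].sum()

# The months with the largest totals are the candidate spikes.
peak_months = monthly.sort_values(ascending=False).head(2)
```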

case fatality per disease for animals

The highest case fatality is for Schmallenberg, followed by Anthrax.
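Case fatality is taken here as total deaths divided by total cases per disease; the notebook does not state the exact formula, so this is an assumption. The `disease` and `sum_deaths` column names are also hypothetical, and the numbers are toy values.

```python
import pandas as pd

# Toy records; "disease" and "sum_deaths" are assumed column names.
df = pd.DataFrame({
    "disease": ["Anthrax", "Anthrax", "Avian_influenza", "Avian_influenza"],
    "sum_cases": [10, 10, 1000, 1000],
    "sum_deaths": [8, 6, 300, 100],
})

# Total cases and deaths per disease.
totals = df.groupby("disease")[["sum_cases", "sum_deaths"]].sum()

# Keep only diseases with at least one case to avoid dividing by zero.
totals = totals[totals["sum_cases"] > 0].copy()

# Case fatality as a ratio of the totals, not an average of per-record ratios.
totals["case_fatality"] = totals["sum_deaths"] / totals["sum_cases"]
ranked = totals.sort_values("case_fatality", ascending=False)
```

Dividing the totals keeps records with very few cases from dominating the ratio, which averaging per-record ratios would not.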

country vs humans_affected sum

country vs human_deaths sum

human_case_fatality per country

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

No

Multivariate Exploration

sum_cases through time for different diseases

As we can see, the spikes are mostly caused by poultry diseases.
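To put a number on that observation, the cases can be pivoted by year and disease and the poultry share computed per year. This is a sketch on toy data: the `observation_date` and `disease` column names and the `poultry` list are assumptions for illustration.

```python
import pandas as pd

# Toy records; column names and values are hypothetical.
df = pd.DataFrame({
    "observation_date": pd.to_datetime(
        ["2006-02-03", "2006-05-11", "2006-06-01", "2012-07-15", "2012-08-02"]
    ),
    "disease": ["Avian_influenza", "Newcastle disease", "Anthrax",
                "Avian_influenza", "African swine fever"],
    "sum_cases": [700, 200, 100, 900, 100],
})
df["year"] = df["observation_date"].dt.year

# One row per year, one column per disease, cells hold the summed cases.
yearly = df.pivot_table(
    index="year", columns="disease", values="sum_cases",
    aggfunc="sum", fill_value=0,
)

# Assumed list of poultry diseases present in the toy data.
poultry = ["Avian_influenza", "Newcastle disease"]
poultry_share = yearly[poultry].sum(axis=1) / yearly.sum(axis=1)
```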

animal cases and deaths per disease

Some diseases, such as Avian Influenza and Newcastle disease, have very high cases and deaths, while most have low values for both. MERS-CoV has zero animal deaths despite its high fatality in humans.

human cases and deaths per disease

sum humans_affected through time for different diseases

human cases and deaths per country

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?